Introduction

Electric vehicles (EVs) are considered a strong contributor to reducing carbon emissions. This project analyzes a data set on electric vehicle usage.

The data were collected by the China Electric Vehicle Association and cover March 1 to March 8, 2022 for NIO-branded ES6 electric vehicles. We take the data as a proxy for EVs across China. The main variables used in this analysis are:

  • AirCondMod: Air conditioning mode, coded as 0 = off, 1 = heating with OHX in use, 2 = cooling, 3 = heating without OHX
  • region_code: China’s administrative region code
  • AmdTemp: Ambient temperature
  • vehicle_num: Number of vehicles
  • CCU_FrntLeTempSet: Front-left temperature setting

## Rows: 564,584
## Columns: 21
## $ AirCondMod        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ date              <chr> "2022/3/2", "2022/3/2", "2022/3/2", "2022/3/2", "202…
## $ hour              <int> 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, …
## $ region_code       <int> 610703, 610722, 610723, 610725, 610726, 610729, 6108…
## $ model_type        <chr> "ES6", "ES6", "ES6", "ES6", "ES6", "ES6", "ES6", "ES…
## $ vehicle_num       <int> 9, 4, 1, 4, 2, 1, 1, 1, 3, 1, 2, 1, 1, 2, 1, 1, 12, …
## $ AmdTemp           <dbl> 14.92, 15.13, 15.50, 15.10, 13.25, 17.00, 10.50, 10.…
## $ weather           <int> 4, 4, 4, 4, 4, 4, 0, 0, 0, 9, 0, 0, 0, 9, 9, 9, 0, 0…
## $ humidity          <int> 40, 40, 40, 40, 40, 40, 10, 10, 28, 15, 28, 28, 28, …
## $ pm25              <dbl> 41, 41, 41, 41, 41, 41, 18, 18, 38, 26, 38, 38, 36, …
## $ ComprActPwr_kwh   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ PtcTotActPwr_kwh  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ PwrActOfChi_kwh   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ trip_eng_kwh      <dbl> 1.592, 3.175, 9.100, 2.240, 7.200, 0.000, 1.000, 0.5…
## $ IntrTemp          <dbl> 22.09, 22.48, 22.58, 22.16, 22.95, 17.38, 20.86, 21.…
## $ CCU_FrntLeTempSet <dbl> 24.38, 24.63, 22.00, 25.30, 26.50, 29.00, 28.00, 31.…
## $ CCU_FrntRiTempSet <dbl> 25.12, 24.88, 22.00, 25.50, 26.50, 28.00, 28.00, 31.…
## $ CCU_FrntBlwSpd    <dbl> 1.16, 1.25, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00…
## $ duration_s        <dbl> 575.92, 688.75, 835.00, 835.20, 2247.00, 15.00, 1320…
## $ mileage_km        <dbl> 7.65, 14.50, 30.00, 13.30, 37.60, 0.00, 4.40, 2.10, …
## $ VehSpd_kph        <dbl> 38.28, 77.78, 130.05, 60.34, 68.86, 1.39, 12.79, 9.0…
##    AirCondMod        date                hour        region_code    
##  Min.   :0.000   Length:564584      Min.   : 0.00   Min.   :110101  
##  1st Qu.:0.000   Class :character   1st Qu.: 9.00   1st Qu.:320581  
##  Median :1.000   Mode  :character   Median :13.00   Median :360481  
##  Mean   :1.327                      Mean   :13.19   Mean   :362876  
##  3rd Qu.:2.000                      3rd Qu.:18.00   3rd Qu.:440515  
##  Max.   :3.000                      Max.   :23.00   Max.   :659008  
##                                                                     
##   model_type         vehicle_num         AmdTemp          weather      
##  Length:564584      Min.   :   1.00   Min.   :-40.00   Min.   : 0.000  
##  Class :character   1st Qu.:   1.00   1st Qu.: 10.07   1st Qu.: 0.000  
##  Mode  :character   Median :   2.00   Median : 14.00   Median : 4.000  
##                     Mean   :  13.37   Mean   : 13.99   Mean   : 5.067  
##                     3rd Qu.:   8.00   3rd Qu.: 18.38   3rd Qu.: 9.000  
##                     Max.   :1328.00   Max.   : 38.50   Max.   :31.000  
##                                       NA's   :11                       
##     humidity        pm25            ComprActPwr_kwh  PtcTotActPwr_kwh 
##  Min.   :  0   Min.   :-100000000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.: 32   1st Qu.:        20   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median : 52   Median :        34   Median :0.0000   Median :0.00100  
##  Mean   : 53   Mean   :   -406811   Mean   :0.0257   Mean   :0.03685  
##  3rd Qu.: 74   3rd Qu.:        51   3rd Qu.:0.0260   3rd Qu.:0.03200  
##  Max.   :100   Max.   :       279   Max.   :1.7680   Max.   :4.14000  
##                                                                       
##  PwrActOfChi_kwh     trip_eng_kwh        IntrTemp      CCU_FrntLeTempSet
##  Min.   :0.000000   Min.   :-76.800   Min.   :-20.42   Min.   :15.00    
##  1st Qu.:0.000000   1st Qu.:  0.400   1st Qu.: 19.73   1st Qu.:22.60    
##  Median :0.000000   Median :  0.797   Median : 21.91   Median :24.00    
##  Mean   :0.000157   Mean   :  1.234   Mean   : 21.42   Mean   :23.90    
##  3rd Qu.:0.000000   3rd Qu.:  1.400   3rd Qu.: 23.55   3rd Qu.:25.07    
##  Max.   :0.961000   Max.   : 32.000   Max.   : 45.32   Max.   :31.00    
##                     NA's   :6                                           
##  CCU_FrntRiTempSet CCU_FrntBlwSpd    duration_s       mileage_km      
##  Min.   :15.00     Min.   :1.000   Min.   :   0.0   Min.   :-350.400  
##  1st Qu.:22.53     1st Qu.:1.570   1st Qu.: 280.0   1st Qu.:   1.100  
##  Median :23.92     Median :2.620   Median : 517.7   Median :   3.090  
##  Mean   :23.83     Mean   :2.733   Mean   : 585.2   Mean   :   4.619  
##  3rd Qu.:25.00     3rd Qu.:3.500   3rd Qu.: 752.5   3rd Qu.:   6.000  
##  Max.   :31.00     Max.   :9.000   Max.   :3599.0   Max.   :  97.700  
##                                                     NA's   :6         
##    VehSpd_kph    
##  Min.   :  0.00  
##  1st Qu.: 15.79  
##  Median : 24.48  
##  Mean   : 30.85  
##  3rd Qu.: 38.00  
##  Max.   :188.84  
## 

Data Processing

First, we remove rows with missing values and sort the remaining data in chronological order. This step drops 17 of the 564,584 rows.
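
The cleaning step can be sketched in base R as follows. This is a minimal illustration on a toy data frame, not the code used to produce the output below: the object names `ev` and `ev_clean` and all values are assumptions, and only the column names come from the data set.

```r
# Toy stand-in for the real data set (values are illustrative only).
ev <- data.frame(
  date        = c("2022/3/2", "2022/3/1", "2022/3/1", "2022/3/8", "2022/3/2"),
  hour        = c(17L, 8L, 0L, 9L, 3L),
  vehicle_num = c(9L, 4L, 1L, 2L, 5L),
  AmdTemp     = c(14.92, NA, 6.22, 10.50, 12.00),
  stringsAsFactors = FALSE
)

n_before <- nrow(ev)

# Drop every row that has a missing value in any column.
ev_clean <- na.omit(ev)

# Parse the date strings into Date objects so that the ordering is
# chronological rather than alphabetical.
ev_clean$date_parsed <- as.Date(ev_clean$date, format = "%Y/%m/%d")

# Sort by date, then by hour within each date.
ev_clean <- ev_clean[order(ev_clean$date_parsed, ev_clean$hour), ]

print(paste("Original data rows:", n_before))
print(paste("Data rows after removing missing values:", nrow(ev_clean)))
```

Note that `na.omit` removes a row if any column is missing, so rows are dropped even when the missing value is in a variable that a later plot does not use.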

## [1] "Original data rows: 564584"
## [1] "Data rows after removing missing values: 564567"
##   AirCondMod     date hour region_code model_type vehicle_num AmdTemp weather
## 1          0 2022/3/1    0      110101        ES6           9    6.22       1
## 2          0 2022/3/1    0      110102        ES6           8    5.94       1
## 3          0 2022/3/1    0      110105        ES6          42    6.33       1
## 4          0 2022/3/1    0      110106        ES6           8    6.11       1
## 5          0 2022/3/1    0      110107        ES6           3    5.17       1
## 6          0 2022/3/1    0      110108        ES6          24    6.80       1
##   humidity pm25 ComprActPwr_kwh PtcTotActPwr_kwh PwrActOfChi_kwh trip_eng_kwh
## 1       27    4               0                0               0        0.989
## 2       27    4               0                0               0        0.437
## 3       27    4               0                0               0        1.175
## 4       33    4               0                0               0        1.156
## 5       29    4               0                0               0        0.600
## 6       30    4               0                0               0        0.800
##   IntrTemp CCU_FrntLeTempSet CCU_FrntRiTempSet CCU_FrntBlwSpd duration_s
## 1    15.21             26.00             26.33           1.22     644.89
## 2    14.94             25.44             25.81           1.27     252.63
## 3    14.68             25.92             25.87           1.52     566.89
## 4    15.02             26.61             26.50           1.46     511.78
## 5     9.89             22.50             22.50           1.00     435.00
## 6    14.32             25.52             25.10           1.29     591.32
##   mileage_km VehSpd_kph
## 1       3.67      33.43
## 2       2.34      35.13
## 3       2.78      33.74
## 4      -2.27      32.21
## 5       2.67      35.25
## 6       2.81      33.67

Pie Plot Analysis for Air Conditioning Mode (categorical variable)

The pie chart shows that, over the observed days in March, air conditioning was switched off in half of the cases, suggesting that the climate was comfortable on many days. The heating modes (heat pump and PTC) together accounted for 32.6% of the demand, higher than the 17.4% share for cooling.
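
A minimal sketch of how such shares might be computed and plotted in base R is shown here. The codes in `mode_codes` are toy values and the label wording is an assumption based on the variable list; whether the original chart counted rows or weighted them by vehicle_num is not stated, and this sketch simply counts rows.

```r
# Toy AirCondMod codes (illustrative only).
mode_codes <- c(0, 0, 0, 0, 1, 1, 2, 3)

# Hypothetical labels following the coding given in the variable list.
mode_labels <- c("Off", "Heating, OHX in use", "Cooling", "Heating, OHX not in use")
ac_mode <- factor(mode_codes, levels = 0:3, labels = mode_labels)

# Percentage share of each mode (unweighted row counts).
share <- round(100 * prop.table(table(ac_mode)), 1)
print(share)

# Pie chart with the percentage appended to each label.
pct <- as.vector(share)
pie(pct, labels = paste0(names(share), " (", pct, "%)"),
    main = "Air conditioning mode")
```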

Line Graph Analysis for Vehicle Number versus Time (numerical variable)

The line graph shows that the number of vehicles on the road follows a clear tidal pattern over the day, with peaks at 8:00 a.m. and 6:00 p.m.
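
One way to obtain an hourly profile like this is to sum vehicle_num within each hour and plot the totals. The sketch below uses a toy data frame; the object names and values are assumptions, and the original analysis may have aggregated differently (for example by date and hour).

```r
# Toy records (illustrative only): one row per region and hour.
ev <- data.frame(
  hour        = c(7L, 8L, 8L, 12L, 18L, 18L, 22L),
  vehicle_num = c(5L, 20L, 15L, 8L, 18L, 14L, 3L)
)

# Total number of vehicles observed in each hour of the day.
hourly <- aggregate(vehicle_num ~ hour, data = ev, FUN = sum)

plot(hourly$hour, hourly$vehicle_num, type = "l",
     xlab = "Hour of day", ylab = "Number of vehicles")

# Hour with the highest total in the toy data.
peak_hour <- hourly$hour[which.max(hourly$vehicle_num)]
print(peak_hour)
```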

Box Plot Analysis for AmdTemp (a set of variables)

Using region_code, we selected the data for Beijing, Shanghai, and Heilongjiang, taken here to represent central, southern, and northern regions respectively, and compared the air conditioner temperature settings of all car owners during the observation period. The box plot shows that owners in Heilongjiang set higher temperatures and owners in Shanghai set lower ones. This is consistent with the geography and climate of each place: the northern region is colder and the southern region is warmer.
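
The region selection can be sketched as follows, assuming the first two digits of the six-digit administrative code identify the province-level unit (11 for Beijing, 31 for Shanghai, 23 for Heilongjiang). The data frame and its values are toy examples, and the use of CCU_FrntLeTempSet as the plotted variable is an assumption based on the paragraph above.

```r
# Toy records (illustrative only).
ev <- data.frame(
  region_code       = c(110101L, 110105L, 310104L, 310115L,
                        230102L, 230103L, 610703L),
  CCU_FrntLeTempSet = c(24.0, 25.0, 22.5, 23.0, 26.0, 27.0, 24.4)
)

# Province-level prefix: the first two digits of the six-digit code.
prefix <- ev$region_code %/% 10000L

province_names <- c("11" = "Beijing", "31" = "Shanghai", "23" = "Heilongjiang")

# Keep only the three regions of interest and attach a readable label.
keep <- prefix %in% as.integer(names(province_names))
selected <- ev[keep, ]
selected$province <- factor(
  province_names[as.character(prefix[keep])],
  levels = c("Heilongjiang", "Beijing", "Shanghai")
)

boxplot(CCU_FrntLeTempSet ~ province, data = selected,
        xlab = "Region", ylab = "Front-left temperature setting")

# Median setting per region in the toy data.
print(tapply(selected$CCU_FrntLeTempSet, selected$province, median))
```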

Histogram Analysis for CCU_FrntLeTempSet (examine the distribution of the data)

We can see several characteristics of the distribution of air conditioning set temperatures:

  • Peak: The distribution has a clear single-peaked shape, with most settings concentrated between 20 and 30 degrees. This indicates that most observations are clustered around a central value.

  • Symmetry: The distribution appears to be relatively symmetrical, with the peak located roughly in the center and a fairly even decline on both sides.

  • Tail behavior: The tails are not very long, with frequencies gradually decreasing towards zero on both sides, indicating that extreme values are uncommon.

  • Kernel density estimation: The kernel density estimate (orange line) in the figure matches the histogram well, indicating that it captures the overall shape of the distribution.

  • Data fluctuations: While the data are mainly concentrated in one area, there are some data points spread outside the peak area, which may indicate some fluctuations or outliers in the data.

The histogram shows that the majority of car owners set their air conditioners between 20 and 28 degrees. Within this range there are three local spikes at 23, 24 and 25 degrees, indicating that these are the most popular settings.
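
A histogram with a kernel density overlay of this kind can be sketched in base R as below. The vector `temp_set` is simulated toy data clipped to the 15 to 31 range seen in the summary table; the bin width and the default bandwidth of `density` are assumptions, not the settings used for the original figure.

```r
set.seed(1)

# Toy temperature settings (illustrative only), clipped to the 15-31 range.
temp_set <- pmin(pmax(rnorm(1000, mean = 24, sd = 2), 15), 31)

# Histogram on the density scale so that the kernel estimate is comparable.
hist(temp_set, breaks = seq(15, 31, by = 0.5), freq = FALSE,
     main = "Distribution of temperature settings",
     xlab = "Temperature setting")

# Gaussian kernel density estimate with the default bandwidth.
dens <- density(temp_set)
lines(dens, col = "orange", lwd = 2)
```

A narrower bin width makes spikes at whole-degree settings easier to see, while a larger bandwidth in `density` smooths them away, so the two views can legitimately disagree about how many peaks there are.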

Central Limit Theorem

We can repeatedly draw random samples from the original data, calculate the mean of each sample, and then plot the distribution of these sample means. According to the Central Limit Theorem, regardless of the shape of the original distribution (provided its variance is finite), the distribution of the sample means approaches a normal distribution as the sample size grows.

The distribution of the sample means is close to normal, which is consistent with the Central Limit Theorem.
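
The resampling procedure can be sketched as follows. The population here is simulated from an exponential distribution purely as a right-skewed stand-in; the number of draws and the sample size are assumptions, since the values used in the original analysis are not stated.

```r
set.seed(42)

# Toy right-skewed population (illustrative only).
population <- rexp(100000, rate = 1 / 13)

n_draws     <- 1000  # number of repeated samples (assumed)
sample_size <- 100   # observations per sample (assumed)

# Mean of each repeated random sample.
sample_means <- replicate(n_draws, mean(sample(population, sample_size)))

hist(sample_means, freq = FALSE,
     main = "Distribution of sample means", xlab = "Sample mean")

# Normal curve predicted by the Central Limit Theorem:
# mean = population mean, sd = population sd / sqrt(sample size).
curve(dnorm(x, mean = mean(population),
            sd = sd(population) / sqrt(sample_size)),
      add = TRUE, col = "orange", lwd = 2)
```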

Different Sampling Methods

## [1] "Full data statistics:"
##   MeanVehicle SDVehicle
## 1    13.37288  40.79825
## [1] "Simple random sample statistics:"
##   MeanVehicle SDVehicle
## 1         2.9  3.695342
## [1] "Stratified sample statistics:"
##   MeanVehicle SDVehicle
## 1    5.433235  21.36222
## [1] "Systematic sample statistics:"
##   MeanVehicle SDVehicle
## 1        17.1  31.19633

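In this run, systematic sampling gives the results closest to the full-data statistics (mean 17.1 vs. 13.37, SD 31.20 vs. 40.80), possibly because sampling at fixed intervals captures the major trends in the data set. Stratified sampling (mean 5.43, SD 21.36) preserves more of the variability in the data than simple random sampling (mean 2.9, SD 3.70). The simple random sample probably fails to reflect the overall statistics because the sample is small and vehicle_num is heavily right-skewed (median 2, maximum 1328), so a few large values that happen to be missed change the result substantially. Because each method was drawn only once, these differences may partly be due to chance.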
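
The three sampling schemes can be sketched in base R as below. Everything here is an assumption for illustration: the toy population, the stratification variable `region`, the sample size `n`, and the use of proportional allocation are not taken from the original analysis.

```r
set.seed(123)

# Toy population (illustrative only): two strata of different size and level.
pop <- data.frame(
  region      = rep(c("A", "B"), times = c(800, 200)),
  vehicle_num = c(rpois(800, 2) + 1, rpois(200, 40) + 1)
)
n <- 100  # sample size (assumed)

# Simple random sample: n rows drawn without replacement.
srs <- pop[sample(nrow(pop), n), ]

# Stratified sample: each stratum contributes in proportion to its size.
strata <- split(seq_len(nrow(pop)), pop$region)
strat_idx <- unlist(lapply(strata, function(idx) {
  sample(idx, round(n * length(idx) / nrow(pop)))
}))
strat <- pop[strat_idx, ]

# Systematic sample: every k-th row after a random start.
k <- nrow(pop) %/% n
start <- sample(k, 1)
sys <- pop[seq(start, by = k, length.out = n), ]

summarise_sample <- function(d) {
  c(MeanVehicle = mean(d$vehicle_num), SDVehicle = sd(d$vehicle_num))
}

results <- rbind(
  Full       = summarise_sample(pop),
  Simple     = summarise_sample(srs),
  Stratified = summarise_sample(strat),
  Systematic = summarise_sample(sys)
)
print(results)
```

Repeating the draws with different seeds gives a sense of how much of the gap between methods is sampling noise rather than a property of the method.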

Results

Although the behavior of individual owners may be random, the data as a whole show clear patterns. From the statistics on air conditioning temperature settings, the number of vehicles, and other variables, we found that air conditioning use in EVs varies by region, while the number of vehicles follows a tidal pattern over the day.